CHARACTERIZATION OF THE YEAST TRANSCRIPTOME 



This application is a continuation-in-part of co-pending application Serial No. 
09/012,031 filed January 22, 1998, the disclosure of which is incorporated by reference 
herein. This invention was made with government support under CA57345 awarded 
by the National Institutes of Health. The government has certain rights in the 
invention. 

TECHNICAL FIELD OF THE INVENTION

This invention is related to the characterization of the expressed genes of the 
yeast genome. More particularly, it is related to the identification and use of previously 
unrecognized genes. 

BACKGROUND OF THE INVENTION 

It is by now axiomatic that the phenotype of an organism is largely determined 
by the genes expressed within it. These expressed genes can be represented by a 
"transcriptome," conveying the identity of each expressed gene and its level of 
expression for a defined population of cells. Unlike the genome, which is essentially 
a static entity, the transcriptome can be modulated by both external and internal 
factors. The transcriptome thereby serves as a dynamic link between an organism's 
genome and its physical characteristics. 

The transcriptome as defined above has not been characterized in any eukaryotic 
or prokaryotic organism, largely because of technological limitations. However, some 
general features of gene expression patterns were elucidated two decades ago through
RNA-DNA hybridization measurements (Bishop et al., 1974; Hereford and Rosbash,
1977). In many organisms, it was thus found that at least three classes of transcripts
could be identified, with either high, medium, or low levels of expression, and the
number of transcripts per cell was estimated (Lewin, 1980). These data of course
provided little information about the specific genes that were members of each class.

Data on the expression levels of individual genes have accumulated as new genes were 
discovered. However, in only a few instances have the absolute levels of expression 
of particular genes been measured and compared to other genes in the same cell type. 
Description of any cell's transcriptome would therefore provide new
information useful for understanding numerous aspects of cell biology and
biochemistry.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide isolated DNA molecules and 
methods of using such molecules to affect the cell cycle and identify candidate drugs. 
These and other objects of the invention are achieved by providing the art with one
or more of the embodiments described below.

According to one embodiment of the invention an isolated DNA molecule is 
provided. It comprises a coding sequence of a yeast gene selected from the group 
consisting of NORF genes comprising a SAGE tag as shown in SEQ ID NOS:67-811.

According to another embodiment of the invention a method of using NORF
genes is provided. The method is for affecting the cell cycle of a cell. The method
comprises the step of administering to a cell an isolated DNA molecule comprising
a coding sequence of a NORF gene whose expression varies by at least 10% between
any two phases of the cell cycle selected from the group consisting of log phase, S
phase, and G2/M.

In yet another embodiment of the invention a method for screening candidate 
antifungal drugs is provided. The method comprises the steps of contacting a test 
substance with a yeast cell and monitoring expression of a NORF gene whose 
expression varies by at least 10% between any two phases of the cell cycle selected
from the group consisting of log phase, S phase, and G2/M, wherein a test substance
which modifies the expression of the yeast gene is a candidate antifungal drug.

In still another embodiment of the invention a method for identifying human 
genes which are involved in cell cycle progression is provided. The method comprises 
the step of contacting human DNA with a probe which comprises at least 14
contiguous nucleotides of a NORF gene whose expression varies by at least 10%
between any two phases of the cell cycle selected from the group consisting of log
phase, S phase, and G2/M. A human DNA sequence which hybridizes to the probe
is identified as a sequence of a candidate human gene which is involved in cell cycle
progression.

The present invention provides probes which comprise at least 14 contiguous
nucleotides of a NORF gene comprising a SAGE tag as shown in SEQ ID NOS:67-811.

The invention also provides an array of probes on a solid support. At least one 
probe in the array comprises at least 14 contiguous nucleotides of a NORF gene 
comprising a SAGE tag as shown in SEQ ID NOS:67-811.

Still another embodiment of the invention is a method of identifying a candidate 
drug as a member of a class of drugs having a characteristic effect on gene expression 
in a yeast cell. A yeast cell is contacted with a candidate drug. Expression of at least 
one NORF gene whose expression is affected by the class of drugs is monitored in the 
yeast cell. Detection of a difference in expression of the at least one NORF gene
relative to expression in the absence of the candidate drug identifies the candidate
drug as a member of the class of drugs. 

These and other embodiments of the invention, which will be apparent to those
of skill in the art upon reading the detailed disclosure provided below, make available
to the art hitherto unrecognized genes, and information about the expression of genes
globally at the organismal level.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1. Schematic of SAGE Method and Genome Analysis. In applying
SAGE to the analysis of yeast gene expression patterns, the 3' most NlaIII site was
used to define a unique position in each transcript and to provide a site for ligation of
a linker with a BsmFI site. The type IIS enzyme BsmFI, which cleaves a defined
distance from its non-palindromic recognition site, was then used to generate a 15 bp
SAGE tag (designated by the black arrows), which includes the NlaIII site.
Automated sequencing of concatenated SAGE tags allowed the routine identification
of about a thousand tags per 36-lane sequencing gel. Once sequenced, the abundance
of each SAGE tag was calculated, and each tag was used to search the entire yeast
genome to identify its corresponding gene. The lower panel shows a small region of
Chromosome 15. Gray arrows indicate all potential SAGE tags (NlaIII sites) and
black arrows indicate 3' most SAGE tags. The total number of tags observed for each
potential tag is indicated above (+ strand) or below (- strand) the tag. As expected, the
observed SAGE tags were associated with the 3' end of expressed genes.

Figure 2. Sampling of Yeast Gene Expression. Analysis of increasing amounts
of ascertained tags reveals a plateau in the number of unique expressed genes.
Triangles represent genes with known functions, squares represent genes predicted on
the basis of sequence information, and circles represent total genes.

Figure 3. Virtual Rot. (A) Abundance Classes in the Yeast Transcriptome.
The transcript abundance is plotted in reverse order on the abscissa, whereas the
fraction of total transcripts with at least that abundance is plotted on the ordinate. The
dotted lines identify the three components of the curve, 1, 2, and 3. This is analogous
to a Rot curve derived from reassociation kinetics where the product of initial RNA
concentration and time is plotted on the abscissa, and the percent of labeled cDNA
that hybridizes to excess mRNA is plotted on the ordinate. (B) Comparison of
Virtual Rot and Rot Components. Transitions and data from virtual Rot components
were calculated from the data in Figure 3A, while data for Rot components were
obtained from Hereford and Rosbash, 1977.

Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast
genes were positioned on each chromosome according to their open reading frame
(ORF) start coordinates. Abundance levels of tags corresponding to each gene are
displayed on the vertical axis, with transcription from the + strand indicated above the
abscissa and that from the - strand indicated below. Yellow bands at ends of the
expanded chromosome represent telomeric regions that are undertranscribed (see text
for details).

Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3, TEF1/2,
and NORF1 are expressed relatively equally in all three states (lane 1, G2/M arrested;
lane 2, S phase arrested; lane 3, log phase), while RNR4, RNR2, and NORF5 are
highly expressed in S-phase arrested cells. The expression level observed by SAGE
(number of tags) is noted below each lane and was highly correlated with quantitation
of the Northern blot by PhosphorImager analysis (r^2 = 0.97).



TABLE LEGENDS 

Table 1. Highly Expressed Genes. Tag represents the 10 bp SAGE tag adjacent
to the NlaIII site; Gene represents the gene or genes corresponding to a particular tag
(multiple genes that match unique tags are from related families, with an average
identity of 93%); Locus and Description denote the locus name and functional
description of each ORF, respectively; Copies/cell represents the abundance of each
transcript in the SAGE library, assuming 15,000 total transcripts per cell and 60,633
ascertained transcripts.

Table 2. Expression of Putative Coding Sequences. Table column headings are 
the same as for Table 1.

Table 3. Expression of the most abundant NORF genes. SAGE Tag, Locus,
and Copies/cell are the same as for Table 1; Chr and Tag Pos denote the chromosome
and position of each tag; ORF Size denotes the size of the ORF corresponding to the
indicated tag. In each case, the tag was located within or less than 250 bp 3' of the
NORF.

Table 4. Expression of NORF genes. SAGE tag and Copies/cell are the same
as for Table 1. Chr and Tag Pos denote the chromosome and position of each tag.

Table 5. Gene expression changes in different cell cycle phases. L denotes log
phase; S denotes synthesis phase; G2/M denotes the mitotic phase. Tag Sequence
represents the 10 bp SAGE tag adjacent to the NlaIII site; "ratio L to S" denotes the
ratio of expression in log phase to expression in synthesis phase; "ratio S to G2/M"
denotes the ratio of expression in synthesis phase to expression in G2/M phase; "ratio
G2/M to L" denotes the ratio of expression in G2/M to log phase. #DIV/0! indicates
an increase in expression from 0; a value of 0 indicates a decrease in expression to 0;
a value of 1 indicates no change; a value less than 1 indicates a decrease in expression;
and a value greater than 1 indicates an increase in expression.

Table 6. Intergenic open reading frames that contain or are adjacent to observed
SAGE tags. Copies/cell represents abundance of each mRNA transcript as in Table
1. Positive expression level indicates the tag is on the + strand of the chromosome;
negative expression level indicates the tag is on the - strand.



DETAILED DESCRIPTION 

It is a discovery of the present invention that certain hitherto unknown genes
(the NORFs) exist and are expressed in yeast. These genes, as well as other
previously identified and previously postulated genes, can be used to study, monitor,
and affect phases of the cell cycle. The present invention identifies which genes are
differentially expressed during the cell cycle. Differentially expressed genes can be
used as markers of phases of the cell cycle. They can also be used to affect a change
in the phase of the cell cycle. In addition, they can be used to screen for drugs which
affect the cell cycle, by affecting expression of the genes. Human homologs of these 
eukaryotic genes are also presumed to exist, and can be identified using the yeast 
genes as probes or primers to identify the human homologs. 

New genes termed NORFs (not previously assigned open reading frames) have
been found. They are uniquely identified by their SAGE tags. In addition their entire
nucleotide sequences are known and publicly available. In general, these were not 
previously identified as genes due to their small size. However, they have now been 
found to be expressed. 

Differentially expressed yeast genes are those whose expression varies by a
statistically significant difference (to greater than 95% confidence level) within
different growth phases, particularly log phase, S phase, and G2/M. Preferably the
difference is at least 10%, 25%, 50%, or 100%. In some cases, differentially
expressed genes are not expressed at detectable levels in one or more cell cycle phases
as determined by SAGE analysis. Genes which have been found to have differential
expression characteristics include: NORF 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2,
ENO2, ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C,
YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and YJR085C.
Differential expression can be detected by any means known in the art, such as 
hybridization to specific probes or immunological assays. 
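
The two criteria above (significance at greater than 95% confidence, and a minimum relative change such as 10%) can be illustrated with a short Python sketch. The text does not say which statistical test is used; the two-proportion z-test below is an assumption chosen for simplicity, and the function names and tag counts are hypothetical.

```python
import math


def two_proportion_p_value(count_a, total_a, count_b, total_b):
    """Two-sided p-value for a difference in tag frequency between two libraries.

    Assumption: a pooled two-proportion z-test; the source names no test.
    """
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 1.0  # tag absent from both libraries: no evidence of a difference
    z = (count_a / total_a - count_b / total_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def is_differential(count_a, total_a, count_b, total_b,
                    min_change=0.10, alpha=0.05):
    """True if the difference is significant and at least `min_change` in size."""
    if two_proportion_p_value(count_a, total_a, count_b, total_b) >= alpha:
        return False
    low, high = sorted((count_a / total_a, count_b / total_b))
    if low == 0:
        return True  # undetectable in one phase, detected in the other
    return (high - low) / low >= min_change


# Hypothetical counts: a tag seen 0 times among 20,184 log phase tags
# and 49 times among 20,034 S phase tags.
print(is_differential(0, 20184, 49, 20034))
```

For tags with very low counts an exact test would be more appropriate than this normal approximation.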

Isolated DNA molecules according to the invention contain less than a whole 
chromosome and can be genomic or cDNA, i.e., lacking introns. Isolated DNA 
molecules can comprise a yeast gene or a coding sequence of a yeast gene involved 
in cell cycle progression, such as NORF genes which comprise SAGE tags as shown 
in SEQ ID NOS:67-811. Isolated DNA molecules which comprise yeast genes or
coding sequences of yeast genes comprising SAGE tags as shown in SEQ ID
NOS:37-12,203 are also isolated DNA molecules of the invention. Isolated DNA molecules
can also consist of a yeast gene or a coding sequence of a yeast gene which comprises
a SAGE tag as shown in SEQ ID NOS:37-12,203 or 67-811.

Any technique for obtaining a DNA of known sequence may be used to obtain 
isolated DNA molecules of the invention. Preferably they are isolated free of other 
cellular components such as membrane components, proteins, and lipids. They can 
be made by a cell and isolated, or synthesized using PCR or an automatic synthesizer.
Methods for purifying and isolating DNA are routine and are known in the art. 

To administer yeast genes to cells, any DNA delivery techniques known in the
art may be used, without limitation. These include liposomes, transfection, mating,
transduction, transformation, viral infection, and electroporation. Vectors for particular
purposes and characteristics can be selected by the skilled artisan for their known 
properties. Cells which can be used as gene recipients are yeast and other fungi, 
mammalian cells, including humans, and bacterial cells. 

Antifungal drugs can be identified using yeast cells as described herein. 
Expression of a differentially expressed NORF gene can be monitored by any means 
known in the art. When a test substance modifies the expression of such a 
differentially expressed gene, for example by increasing or decreasing its expression, 
it is a candidate drug for affecting the growth properties of fungi and may be useful 
as an antifungal agent. Expression of more than one NORF gene can be monitored. 
For example, expression of 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 75, 100, 150, 250,
300, 350, 400, 450, or 500 or more NORF genes can be monitored in single or
multiple assays.

Because differentially expressed genes are likely to be involved in cell cycle 
progression, it is likely that these genes are conserved among species. The 
differentially expressed NORF genes identified by the present invention can be used
to identify homologs in humans and other mammals by contacting DNA from these
mammals with a probe which comprises at least 10 contiguous nucleotides of a
differentially expressed NORF gene. The DNA can be genomic or cDNA, as is
known in the art. Means for identifying homologous genes among different species
are well known in the art. Briefly, stringency of hybridization can be reduced so that
imperfectly matching sequences hybridize. This can be in the context of inter alia
Southern blots, Northern blots, colony hybridization or PCR. Any hybridization
technique which is known in the art can be used. A DNA sequence which hybridizes
to the probe is identified as a sequence of a candidate gene which is involved in cell
cycle expression.

Probes according to the present invention are isolated DNA molecules which 
have at least 10, and preferably at least 12, 14, 16, 18, 20, or 25 contiguous 
nucleotides of a particular NORF gene or other differentially expressed gene. The 
probes may or may not be labeled. They may be used, for example, as primers for
PCR assays, or for detection of gene expression for Southern or Northern blots or in
situ hybridization. Preferably the probes are immobilized on a solid support. The
solid support can be any surface to which a probe can be attached. Suitable solid
supports include, but are not limited to, glass or plastic slides, tissue culture plates,
microtiter wells, tubes, or particles such as beads, including but not limited to latex,
polystyrene, or glass beads. Any method known in the art can be used to attach the
probe to the solid support, including use of covalent and non-covalent linkages,
passive absorption, or pairs of binding moieties attached respectively to the probe and 
the solid support. 

More preferably, probes are present on an array so that multiple probes can
simultaneously hybridize to a single biological sample. The probes can be spotted
onto the array or synthesized in situ on the array. See Lockhart et al., Nature
Biotechnology, Vol. 14, December 1996, "Expression monitoring by hybridization
to high-density oligonucleotide arrays." A single array contains at least one NORF
probe, but can contain more than 100, 500 or even 1,000 different probes in discrete
locations. If desired, one or more NORF probe(s) present on the array can be
nucleotide sequences from a NORF gene which is differentially expressed during the
cell cycle.

Genes identified by the present invention which are differentially expressed 
during the cell cycle can also be used to obtain gene expression profiles characteristic 
of the response of yeast genes of a yeast cell to a particular drug or class of drugs. 

Classes of drugs of particular interest for which gene expression profiles can be
generated include those drugs which affect cell cycle or other cell processes, such as
chemotherapeutic agents. If desired, gene expression profiles characteristic of more
than one drug of a particular class can be generated and used to make a composite
gene expression profile. For example, microtubule poison drugs such as vinblastine,
taxol, vincristine, and taxotere can be used to generate gene expression profiles
characteristic of microtubule poisons.

To generate a gene expression profile characteristic of a particular drug or class 
of drugs, a yeast cell is contacted with a particular drug or a member of a particular 
class of drugs. Expression of at least one yeast gene is monitored, either before and
after contacting or in the contacted cell and in another yeast cell which has not been
contacted with the drug. Genes which are monitored can be any yeast gene, including
NORFs. Preferably, these genes are differentially expressed during the cell cycle. For
example, yeast genes can be selected from genes comprising the SAGE tags shown
in Tables 3, 4, 5, and 6 (SEQ ID NOS:67-12,203). If desired, genes such as NORF
1, 2, 4, 5, 6, 17, 25, or 27, TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, PYK1, YKL056C, YMR116C, YEL033W, YOR182C, YCR013C,
ribonucleotide reductase 2 and 4, and YJR085C, can be used for monitoring 
alterations in gene expression. 

The expression of any number of these genes, such as 1, 2, 3, 4, 5, 10, 15, 20,
25, 30, 40, 50, 60, 75, 100, 150, 250, 500, 1000, 2000, 3000, 4000, 5000, or 5,500
genes, can be measured. It is particularly convenient to monitor expression of the
differentially expressed genes using nucleic acids which are immobilized on a solid
support or in an array, such as the gene arrays described above.

Many genes, particularly cell cycle genes, are likely to be conserved between 
yeast and mammals, including humans. Thus, gene expression profiles characteristic 
of a drug or class of drugs can be used to predict the effects of candidate drugs on
human cells, by identifying the candidate drug as a member of a class of drugs whose
characteristic gene expression profile is known. The candidate drugs can be
pharmacologic agents already known in the art or can be compounds previously
unknown to have any pharmacological activity. The candidate drugs can be naturally
occurring or designed in the laboratory. They can be isolated from microorganisms,
animals, or plants, and can be produced recombinantly or synthesized by chemical
methods known in the art. 

The effect of a candidate drug on expression of at least one gene whose 
expression is affected by the class of drugs is monitored. A gene expression profile
obtained using the candidate drug which is similar to a gene expression profile for a
particular drug or class of drugs identifies the candidate drug as a member of that class
of drugs. 
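
One way such a profile comparison could be carried out is sketched below in Python. The disclosure does not specify how similarity between profiles is measured; the Pearson correlation, the 0.8 threshold, the function names, and the example profiles (expression changes for the same genes in the same order) are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no information about similarity
    return cov / math.sqrt(var_x * var_y)


def classify_candidate(candidate, class_profiles, threshold=0.8):
    """Return the best-matching class name, or None if nothing reaches threshold."""
    best_name, best_r = None, threshold
    for name, profile in class_profiles.items():
        r = pearson(candidate, profile)
        if r >= best_r:
            best_name, best_r = name, r
    return best_name


# Hypothetical expression changes for four monitored genes.
classes = {
    "microtubule poison": [2.0, 1.5, -1.0, 0.0],
    "other": [-1.0, 0.5, 2.0, 1.0],
}
print(classify_candidate([1.8, 1.4, -0.9, 0.1], classes))
```

A candidate whose profile matches no class above the threshold is left unclassified rather than forced into the nearest class.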

The effect of modifying particular substituents of a known drug or of a candidate 
drug can be similarly tested. Such methods are useful for determining whether
alterations intended, for example, to increase solubility or absorption of a particular
drug will have an unintended and possibly deleterious effect on genes which are
differentially expressed during the cell cycle. 

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific
examples which are provided herein for purposes of illustration only, and are not
intended to limit the scope of the invention.

EXAMPLE 

Summary 

We have analyzed the set of genes expressed from the yeast genome, herein
called the transcriptome, using serial analysis of gene expression (SAGE). Analysis
of 60,633 transcripts revealed 4,665 genes, with expression levels ranging from 0.3
to over 200 transcripts per cell. Of these genes, 1,981 had known functions, while
2,684 were previously uncharacterized. Integration of positional information with
gene expression data allowed the generation of chromosomal expression maps,
identifying physical regions of transcriptional activity, and identified genes that had
not been predicted by sequence information alone. These studies provide insight into
global patterns of gene expression in yeast and demonstrate the feasibility of
genome-wide expression studies in eukaryotes.

Results 

Characteristics and Rationale of SAGE Approach

Several methods have recently been described for the high throughput
evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995; Velculescu
et al., 1995). We used SAGE (Serial Analysis of Gene Expression) because it can
provide quantitative gene expression data without the prerequisite of a hybridization
probe for each transcript. The SAGE technology is based on two basic principles
(Figure 1). First, a short sequence tag (9-11 bp) contains sufficient information to
uniquely identify a transcript, provided that it is derived from a defined location within
that transcript. Second, many transcript tags can be concatenated into a single
molecule and then sequenced, revealing the identity of multiple tags simultaneously.
The expression pattern of any population of transcripts can be quantitatively evaluated
by determining the abundance of individual tags and identifying the gene
corresponding to each tag.
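
The idea of a tag taken from a defined position can be illustrated in silico with a minimal Python sketch. It assumes the tag is the 10 bp immediately 3' of the 3'-most NlaIII site (CATG), as described in the table legends; the function names and the example sequence are hypothetical, and the sketch models tag derivation from a known transcript sequence rather than the laboratory procedure.

```python
from collections import Counter

NLAIII_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 10       # bases kept 3' of the site (taken from the table legends)


def extract_sage_tag(transcript):
    """Return the 10 bp tag following the 3'-most CATG, or None if there is none."""
    seq = transcript.upper()
    site = seq.rfind(NLAIII_SITE)  # position of the 3'-most NlaIII site
    if site == -1:
        return None
    start = site + len(NLAIII_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None  # too close to the 3' end


def count_tags(transcripts):
    """Tally tag abundance over a population of transcript sequences."""
    tags = (extract_sage_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


# Hypothetical transcript with two CATG sites; only the 3'-most one defines the tag.
example = "AAACATGTTTTCATGACGTACGTACGGG"
print(extract_sage_tag(example))
print(count_tags([example, example, "AAAA"]))
```

Transcripts with no NlaIII site, or with fewer than 10 bases after the last site, yield no tag in this sketch and are not counted.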



Genome-wide expression 

In order to maximize representation of genes involved in normal growth and cell-cycle
progression, SAGE libraries were generated from yeast cells in three states: log phase,
S phase arrested and G2/M phase arrested. In total, SAGE tags corresponding to
60,633 total transcripts were identified (including 20,184 from log phase, 20,034 from
S phase arrested, and 20,415 from G2/M phase arrested cells). Of these tags, 56,291
tags (93%) precisely matched the yeast genome, 88 tags matched the mitochondrial
genome, and 91 tags matched the 2 micron plasmid.

The number of SAGE tags required to define a yeast transcriptome depends on
the confidence level desired for detecting low abundance mRNA molecules.
Assuming the previously derived estimate of 15,000 mRNA molecules per cell
(Hereford and Rosbash, 1977), 20,000 tags would represent a 1.3 fold coverage even
for mRNA molecules present at a single copy per cell, and would provide a 72%
probability of detecting such transcripts (as determined by Monte Carlo simulations).
Analysis of 20,184 tags from log phase cells identified 3,298 unique genes. As an
independent confirmation of mRNA copy number per cell, we compared the
expression level of SUP44/RPS4, one of the few genes whose absolute mRNA levels
have been reliably determined by quantitative hybridization experiments (Iyer and
Struhl, 1996), with expression levels determined by SAGE. SUP44/RPS4 was
measured by hybridization at 75 +/- 10 copies/cell (Iyer and Struhl, 1996), in good
accord with the SAGE data of 63 copies/cell, suggesting that the estimate of 15,000
mRNA molecules per cell was reasonably accurate. Analysis of SAGE tags from S
phase arrested and G2/M phase arrested cells revealed similar expression levels for
this gene (range 52 to 55 copies/cell), as well as for the vast majority of expressed
genes. As less than 1% of the genes were expressed at dramatically different levels
among these three states (see below), SAGE tags obtained from all libraries were
combined and used to analyze global patterns of gene expression.
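
The conversion from a tag count to copies per cell follows directly from the stated assumptions of 15,000 transcripts per cell and 60,633 ascertained tags. The short Python sketch below shows the arithmetic; the function name and the example tag count are hypothetical.

```python
TRANSCRIPTS_PER_CELL = 15_000  # estimate of Hereford and Rosbash, 1977
TOTAL_TAGS = 60_633            # tags ascertained across all three libraries


def copies_per_cell(tag_count, total_tags=TOTAL_TAGS,
                    transcripts_per_cell=TRANSCRIPTS_PER_CELL):
    """Scale an observed tag count to an estimated number of mRNA copies per cell."""
    return tag_count * transcripts_per_cell / total_tags


# A hypothetical tag observed 255 times in the combined library
# corresponds to roughly 63 copies per cell.
print(round(copies_per_cell(255)))
```

Passing a single library's total (for example 20,184 for log phase) as `total_tags` gives the per-library estimate.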

Analysis of ascertained tags at increasing increments revealed that the number
of unique transcripts plateaued at ~60,000 tags (Figure 2). This suggested that
generation of further SAGE tags would yield few additional genes, consistent with the
fact that sixty thousand transcripts represented a four-fold redundancy for genes
expressed as low as 1 transcript per cell. Likewise, Monte Carlo simulations indicated
that analysis of 60,000 tags would identify at least one tag for a given transcript 97%
of the time if its expression level was one copy per cell.
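
The detection probabilities can be approximated with a simple sampling model, sketched below in Python. The simulation procedure actually used is not described, so this is an assumption: each sequenced tag is treated as an independent draw from 15,000 transcripts per cell. Under that model the results are close to, though not identical with, the simulated figures quoted above. Function names and the seed are hypothetical.

```python
import random


def detection_probability(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000):
    """Analytic chance of seeing at least one tag, assuming independent draws."""
    p = copies_per_cell / transcripts_per_cell
    return 1.0 - (1.0 - p) ** n_tags


def simulate_detection(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000,
                       trials=200, seed=0):
    """Monte Carlo estimate of the same probability under the same model."""
    rng = random.Random(seed)
    p = copies_per_cell / transcripts_per_cell
    hits = 0
    for _ in range(trials):
        # One simulated library: is any of the n_tags draws the transcript of interest?
        if any(rng.random() < p for _ in range(n_tags)):
            hits += 1
    return hits / trials


print(detection_probability(20_000))  # about 0.74 under this model
print(detection_probability(60_000))  # about 0.98 under this model
```

The analytic form is shown first because it needs no random sampling; the simulation is included only to mirror the Monte Carlo approach mentioned in the text.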

The 56,291 tags that precisely matched the yeast genome represented 4,665
different genes. This number is in agreement with the estimate of 3,000 to 4,000
expressed genes obtained by RNA-DNA reassociation kinetics (Hereford and
Rosbash, 1977). These expressed genes included 85% of the genes with characterized
functions (1,981 of 2,340), and 76% of the total genes predicted from analysis of the
yeast genome (4,665 of 6,121). These numbers are consistent with a relatively
complete sampling of the yeast transcriptome given the limited number of
physiological states examined and the large number of genes predicted solely on the
basis of genomic sequence analysis.

The transcript expression per gene was observed to vary from 0.3 to over 200
copies per cell. Analysis of the distribution of gene expression levels revealed several
abundance classes that were similar to those observed in previous studies using
reassociation kinetics. A "virtual Rot" of the genes observed by SAGE (Figure 3A)
identified three main components of the transcriptome with abundances ranging over
three orders of magnitude. A Rot curve derived from RNA-cDNA reassociation
kinetics also contained three main components distributed over a similar range of
abundances (Hereford and Rosbash, 1977). Although the kinetics of reassociation of
a particular class of RNA and cDNA may be affected by numerous experimental
variables, there were striking similarities between Rot and virtual Rot analyses (Figure
3B). Because Rot analysis may not detect all transcripts of low abundance (Lewin,
1980), it is not surprising that SAGE revealed both a larger total number of expressed
genes and a higher fraction of the transcriptome belonging to the low abundance
transcript class.
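
The virtual Rot curve described in the legend to Figure 3 (abundance in reverse order against the fraction of total transcripts with at least that abundance) can be computed as in the following Python sketch. The function name and the example abundances are hypothetical, and the identification of the three components from the curve is not attempted here.

```python
from collections import Counter


def virtual_rot(copies_per_gene):
    """Return (abundance, cumulative fraction of transcripts) from high to low abundance.

    `copies_per_gene` is a list with one abundance value per expressed gene.
    """
    total = sum(copies_per_gene)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    transcripts_at_level = Counter()
    for copies in copies_per_gene:
        transcripts_at_level[copies] += copies
    points = []
    running = 0.0
    for level in sorted(transcripts_at_level, reverse=True):
        running += transcripts_at_level[level]
        points.append((level, running / total))
    return points


# Hypothetical abundances: one high, two medium, and four low abundance genes.
for level, fraction in virtual_rot([100, 10, 10, 1, 1, 1, 1]):
    print(level, round(fraction, 3))
```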

Integration of Expression Information with the Genomic Map 

The SAGE expression data could be integrated with existing positional information
to generate chromosomal expression maps (Figure 4). These maps were generated
using the sequence of the yeast genome and the position coordinates of ORFs obtained
from the Stanford Yeast Genome Database. Although there were a few genes that
were noted to be physically proximal and have similarly high levels of expression,
there did not appear to be any clusters of particularly high or low expression on any
chromosome. Genes like histones H3 and H4, which are known to have coregulated
divergent promoters and are immediately adjacent on chromosome 14 (Smith and
Murray, 1983), had very similar expression levels (5 and 6 copies per cell,
respectively). The distribution of transcripts among the chromosomes suggested that
overall transcription was evenly dispersed, with total transcript levels being roughly
linearly related to chromosome size (r^2 = 0.85, data not shown). However, regions
within 10 kb of telomeres appeared to be uniformly undertranscribed, containing on
average 3.2 tags per gene as compared with 12.4 tags per gene for non-telomeric
regions (Figure 4). This is consistent with the previously described observations of
"telomeric silencing" in yeast (Gottschling et al., 1990). Recent studies have reported
telomeric position effects as far as 4 kb from telomere ends (Renauld et al., 1993).
Gene Expression Patterns 

Table 1 lists the 30 most highly expressed genes, all of which are expressed at
greater than 60 mRNA copies per cell. As expected, these genes mostly correspond
to well characterized enzymes involved in energy metabolism and protein synthesis
and were expressed at similar levels in all three growth states (examples in Figure 5).
Some of these genes, including ENO2 (McAlister and Holland, 1982), PDC1 (Schmitt
et al., 1983), PGK1 (Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and
ADH1 (Denis et al., 1983), are known to be dramatically induced in the glucose-rich
growth conditions used in this study. In contrast, glucose repressible genes such as
the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979), and GAL3 (Bajwa et al.,
1988) were observed to be expressed at very low levels (0.3 or fewer copies per cell).

As expected for the yeast strain used in this study, mating type a specific genes,
such as the a factor genes (MFA1, MFA2) (Michaelis and Herskowitz, 1988), and
alpha factor receptor (STE2) (Burkholder and Hartwell, 1985) were all observed to be
expressed at significant levels (range 2 to 10 copies per cell), while mating type alpha
specific genes (MFα1, MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz,
1982; Singh et al., 1983) were observed to be expressed at very low levels (<0.3
copies/cell).

Three of the highly expressed genes in Table 1 had not been previously
characterized. One contained an ORF with predicted ribosomal function, previously
identified only by genomic sequence analysis. Analyses of all SAGE data suggested
that there were 2,684 such genes corresponding to uncharacterized ORFs which were
transcribed at detectable levels. The 30 most abundant of these transcripts were
observed more than 30 times, corresponding to at least 8 transcripts per cell (Table 2).
The other two highly expressed uncharacterized genes corresponded to ORFs not
predicted by analysis of the yeast genome sequence (NORF = Nonannotated ORF).
Analyses of SAGE data suggested that there were at least 160 NORF genes transcribed
at detectable levels. The 30 most abundant of these transcripts were observed at least
9 times (Table 3 and examples in Figure 5).
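
The relationship between observed tag counts and transcript copies per cell can be illustrated with a short calculation. The sketch below is only an illustration: it assumes a total of 15,000 mRNA molecules per cell, a figure that is not stated in this section, and the constant and function names are hypothetical.

```python
# Illustrative sketch: convert a SAGE tag count into estimated mRNA
# copies per cell by simple proportional scaling.
TOTAL_TAGS_ANALYZED = 60_633    # tags analyzed, as stated in the text
ASSUMED_MRNA_PER_CELL = 15_000  # ASSUMPTION: not stated in this section


def copies_per_cell(tag_count,
                    total_tags=TOTAL_TAGS_ANALYZED,
                    mrna_per_cell=ASSUMED_MRNA_PER_CELL):
    """Scale a tag count by the assumed number of mRNA molecules per cell."""
    return tag_count * mrna_per_cell / total_tags


# Example: estimated copies per cell for a few tag counts.
for count in (1, 9, 31):
    print(count, "tags ->", round(copies_per_cell(count), 2), "copies per cell")
```

Under this assumption a single observed tag corresponds to roughly a quarter of a transcript copy per cell, which is of the same order as the 0.3 copies per cell detection level used in the analysis.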

Interestingly, one of the NORF genes (NORF5) was only expressed in S phase
arrested cells and corresponded to the transcript whose abundance varied the most in
the three states analyzed (>49 fold, Figure 5). Comparison of S phase arrested cells
to the other states also identified greater than 9 fold elevation of the RNR2 and RNR4
transcripts (Figure 5). Induction of these ribonucleotide reductase genes is likely to
be due to the hydroxyurea treatment used to arrest cells in S phase (Elledge and Davis,
1989). Likewise, comparison of G2/M arrested cells identified elevation of RBL2
and dynein light chain, both microtubule associated proteins (Archer et al., 1995; Dick
et al., 1996). As with the RNR inductions, these elevated levels seem likely to be
related to the nocodazole treatment used to arrest cells in the G2/M phase. While
there were many relatively small differences between the states (for example, NORF1,
Figure 5), overall comparison of the three states revealed surprisingly few dramatic
differences; there were only 29 transcripts whose abundance varied more than 10 fold
among the three different states analyzed (Tables 4 and 5).

A comprehensive analysis for genes was performed using the SAGE
data. Yeast genome intergenic regions were defined as regions outside annotated
ORFs or the 500 bp region downstream of annotated ORFs (yeast genome sequence
and tables of annotated ORFs were obtained from SGD at
http://genome-www.stanford.edu/Saccharomyces/). Based on sequence analysis a
total of 9524 putative ORFs of 25-99 amino acids were present in the intergenic
regions; 510 of these ORFs contain or are adjacent to observed SAGE tags (Table 6).
Of the 60,633 SAGE tags analyzed, there were 302 unique SAGE tags either within
or adjacent to intergenic ORFs (100 bp upstream or 500 bp downstream of the ORF)
(Table 6). Note that in some cases, more than one NORF contains or is adjacent to the
SAGE tag. These tags matched the genome uniquely, were in the correct orientation,
and were expressed at levels greater than 0.3 transcript copies per cell.
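
The positional criterion described above (a tag lying within an intergenic ORF, or within 100 bp upstream or 500 bp downstream of it) can be sketched as a simple interval test. This is a minimal illustration under stated assumptions: coordinates are 1-based and inclusive with start not greater than end, "upstream" and "downstream" are taken relative to the strand of the ORF, and the function name is hypothetical. Tag orientation and genome uniqueness, which the analysis also required, are not checked here.

```python
# Illustrative sketch: is a tag position within or adjacent to an ORF?
# ASSUMPTION: 1-based inclusive coordinates with orf_start <= orf_end;
# upstream/downstream are interpreted relative to the ORF strand.

def tag_near_orf(tag_pos, orf_start, orf_end, orf_strand,
                 upstream_bp=100, downstream_bp=500):
    """Return True if tag_pos falls in the ORF or its flanking window."""
    if orf_start > orf_end:
        raise ValueError("orf_start must not exceed orf_end")
    if orf_strand == "+":
        low, high = orf_start - upstream_bp, orf_end + downstream_bp
    elif orf_strand == "-":
        # On the minus strand, downstream lies at lower coordinates.
        low, high = orf_start - downstream_bp, orf_end + upstream_bp
    else:
        raise ValueError("orf_strand must be '+' or '-'")
    return low <= tag_pos <= high
```

For example, for a plus-strand ORF spanning positions 1000 to 1300, a tag at position 1750 would fall inside the downstream window, whereas the same tag would fall outside the window of a minus-strand ORF at the same coordinates.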

The expression level for each NORF shown in Table 6 corresponds to the 
number of mRNA transcript copies per cell. If the expression level is positive it 
means that the tag is on the + strand of the chromosome; if negative, the tag is on the 
- strand of the chromosome. 
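
The sign convention described above can be decoded mechanically. The following sketch is illustrative only; the type and function names are hypothetical and are not part of Table 6.

```python
# Illustrative sketch: split a signed expression level into its
# magnitude (copies per cell) and the strand implied by its sign.
from typing import NamedTuple


class NorfExpression(NamedTuple):
    copies_per_cell: float
    strand: str  # "+" or "-"


def decode_signed_level(signed_level: float) -> NorfExpression:
    """Positive values map to the + strand, negative values to the - strand."""
    if signed_level == 0:
        raise ValueError("a zero level carries no strand information")
    strand = "+" if signed_level > 0 else "-"
    return NorfExpression(abs(signed_level), strand)
```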



Discussion 

Analysis of a yeast transcriptome affords a unique view of the RNA components
defining cellular life. Comparison of gene expression patterns from altered
physiologic states can provide insight into genes that are important in a variety of
processes. Comparison of transcriptomes from a variety of physiologic states should
provide a minimum set of genes whose expression is required for normal vegetative
growth, and another set composed of genes that will be expressed only in response to
specific environmental stimuli, or during specialized processes. For example, recent
work has defined a minimal set of 250 genes required for prokaryotic cellular life
(Mushegian and Koonin, 1996). Examination of the yeast genome readily identified
homologous genes for 196 of these, over 90% of which were observed to be expressed
in the SAGE analysis. Detailed analyses of yeast transcriptomes, as well as
transcriptomes from other organisms, should ultimately allow the generation of a
minimal set of genes required for eukaryotic life.

Like other genome-wide analyses, SAGE analysis of yeast transcriptomes has
several potential limitations. First, a small number of transcripts would be expected
to lack an NlaIII site and therefore would not be detected by our analysis. Second, our
analysis was limited to transcripts found at least as frequently as 0.3 copies per cell.
Transcripts expressed in only a minute fraction of the cell cycle, or transcripts
expressed in only a fraction of the cell population, would not be reliably detected by
our analysis. Finally, mRNA sequence data are practically unavailable for yeast, and
consequently, some SAGE tags cannot be unambiguously matched to corresponding
genes. Tags which were derived from overlapping genes, or genes which have
unusually long 3' untranslated regions may be misassigned. Increased availability of
3' UTR sequences in yeast mRNA molecules should help to resolve the ambiguities.

Despite these potential limitations, it is clear that the analyses described here
furnish both global and local pictures of gene expression, precisely defined at the
nucleotide level. These data, like the sequence of the yeast genome itself, provide
simple, basic information integral to the interpretation of many experiments in the
future. The availability of mRNA sequence information from EST sequencing, as
well as various genome projects, will soon allow definition of transcriptomes from a
variety of organisms, including human. The data recorded here suggest that a
reasonably complete picture of a human cell transcriptome will require only about
10-20 fold more tags than evaluated here, a number well within the practical realm
achievable with a small number of automated sequencers. Global expression
patterns in higher eukaryotes are expected, in general, to be similar to those
reported here for S. cerevisiae. However, the analysis of the transcriptome in different
cells and from different individuals should yield a wealth of information regarding
gene function in normal, developmental, and disease states.

Experimental Procedures 
Yeast cell culture 

The source of transcripts for all experiments was S. cerevisiae strain YPH499
(MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski and
Hieter, 1989). Logarithmically growing cells were obtained by growing yeast cells to
early log phase (3x10^ cells/ml) in YPD (Rose et al., 1990) rich medium (YPD
supplemented with 6 mM uracil, 4.8 mM adenine and 24 mM tryptophan) at 30°C.
For arrest in the G1/S phase of the cell cycle, hydroxyurea (0.1 M) was added to early
log phase cells, and the culture was incubated an additional 3.5 hours at 30°C. For
arrest in the G2/M phase of the cell cycle, nocodazole (15 µg/ml) was added to early
log phase cells and the culture was incubated for an additional 100 minutes at 30°C.
Harvested cells were washed once with water prior to freezing at -70°C. The growth
states of the harvested cells were confirmed by microscopic and flow cytometric
analyses (Basrai et al., 1996).

SAGE protocol

The SAGE method was performed as previously described (Velculescu et al.,
1995; Kinzler et al., U.S. Patents 5,866,330 and 5,695,937), with exceptions noted
below. PolyA RNA was converted to double-stranded cDNA with a BRL synthesis
kit using the manufacturer's protocol except for the inclusion of primer
biotin-5'-T18-3'. The cDNA was cleaved with NlaIII (Anchoring Enzyme). As NlaIII
sites were observed to occur once every 309 base pairs in three arbitrarily chosen yeast
chromosomes (1, 5, 10), 95% of yeast transcripts were predicted to be detectable with
a NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on
streptavidin coated magnetic beads (Dynal), the bound cDNA was divided into two
pools, and one of the following linkers containing recognition sites for BsmFI was
ligated to each pool: Linker 1,
5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ ID NO:1),
5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' (SEQ ID NO:2);
Linker 2,
5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID NO:3),
5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' (SEQ ID NO:4).

As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition site, and
the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag was released with
BsmFI. SAGE tag overhangs were filled-in with Klenow, and tags from the two pools
were combined and ligated to each other. The ligation product was diluted and then
amplified with PCR for 28 cycles with 5'-GGATTTGCTGGTGCAGTACA-3' (SEQ
ID NO:5) and 5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6), as primers. The
PCR product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the
PCR product containing two tags ligated tail to tail (ditag) was excised. The PCR
product was then cleaved with NlaIII, and the band containing the ditags was excised
and self-ligated. After ligation, the concatenated products were separated by PAGE
and products between 500 bp and 2 kb were excised. These products were cloned into
the SphI site of pZero (Invitrogen). Colonies were screened for inserts by PCR with
M13 forward and M13 reverse sequences located outside the cloning site as primers.
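
Because the concatenated products consist of ditags punctuated by NlaIII sites (CATG), individual tags can in principle be recovered from a clone sequence by splitting at those sites. The following is a minimal sketch of that idea and is not the SAGE program group itself; the accepted ditag length range, the use of 10 bp per tag, and all function names are assumptions made for illustration.

```python
# Illustrative sketch: recover tags from a concatemer of ditags.
# ASSUMPTIONS: ditags between anchoring sites are 20-24 bp long and the
# 10 bp adjacent to each CATG is taken as the tag; names are hypothetical.
ANCHOR_SITE = "CATG"            # NlaIII recognition sequence
TAG_LENGTH = 10
MIN_DITAG, MAX_DITAG = 20, 24   # assumed acceptable ditag lengths

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer):
    """Split a clone sequence at CATG and return the tags of each ditag."""
    fragments = concatemer.upper().split(ANCHOR_SITE)
    tags = []
    # The first and last fragments are flanking sequence, not ditags.
    for fragment in fragments[1:-1]:
        if MIN_DITAG <= len(fragment) <= MAX_DITAG:
            # Left tag reads forward from its anchoring site.
            tags.append(fragment[:TAG_LENGTH])
            # Right tag is on the opposite strand, so reverse complement it.
            tags.append(reverse_complement(fragment[-TAG_LENGTH:]))
    return tags
```

Fragments outside the assumed length range are simply skipped in this sketch; a production tool would likely record them for inspection.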

PCR products from selected clones were sequenced with the TaqFS DyePrimer
kits (Perkin Elmer) and analyzed using a 377 ABI automated sequencer (Perkin
Elmer), following the manufacturer's protocol. Each successful sequencing reaction
identified an average of 26 tags; given a 90% sequencing reaction success rate, this
corresponded to an average of about 850 tags per sequencing gel.
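
The throughput figure above can be reproduced with simple arithmetic. The sketch below assumes 36 sequencing reactions (lanes) per gel, a value that is not stated in the text and is chosen only because it yields a result close to the quoted figure; the function name is hypothetical.

```python
# Illustrative sketch: expected tags per sequencing gel.
# ASSUMPTION: 36 lanes per gel (not stated in the text).

def expected_tags_per_gel(tags_per_reaction=26, success_rate=0.90,
                          lanes_per_gel=36):
    """Average tags per gel = tags per reaction x success rate x lanes."""
    return tags_per_reaction * success_rate * lanes_per_gel


print(round(expected_tags_per_gel()))
```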



SAGE data analysis 

Sequence files were analyzed by means of the SAGE program group
(Velculescu et al., 1995), which identifies the anchoring enzyme site with the proper
spacing and extracts the two intervening tags and records them in a database. The
68,691 tags obtained contained 62,965 tags from unique ditags and 5,726 tags from
repeated ditags. The latter were counted only once to eliminate potential PCR bias of
the quantitation, as described (Velculescu et al., 1995). Of 62,965 tags, 2,332 tags
corresponded to linker sequences, and were excluded from further analysis. Of the
remaining tags, 4,342 tags could not be assigned, and were likely due to sequencing
errors (in the tags or in the yeast genomic sequence). If all of these were due to tag
sequencing errors, this corresponds to a sequencing error rate of about 0.7% per base
pair (for a 10 bp tag), not far from what we would have expected under our automated
sequencing conditions. However, some unassigned tags had a much higher than
expected frequency of A's as the last five base pairs of the tag (5 of the 52 most
abundant unassigned tags), suggesting that these tags were derived from transcripts
containing anchoring enzyme sites within several base pairs from their polyA tails.
Given the frequency of NlaIII sites in the genome (one in 309 base pairs),
approximately 3% of transcripts were predicted to contain NlaIII sites within 10 bp of
their polyA tails.
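
The tag accounting and the two estimates in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; it assumes that sequencing errors are independent at each base of a 10 bp tag and that NlaIII sites are spaced uniformly, and the variable names are hypothetical.

```python
# Illustrative sketch: tag accounting, per-base error rate, and the
# fraction of transcripts with an NlaIII site near the polyA tail.
TOTAL_TAGS = 68_691
TAGS_FROM_UNIQUE_DITAGS = 62_965
TAGS_FROM_REPEATED_DITAGS = 5_726
LINKER_TAGS = 2_332
UNASSIGNED_TAGS = 4_342
TAG_LENGTH_BP = 10
NLAIII_SPACING_BP = 309

# The two ditag classes account for all tags obtained.
assert TAGS_FROM_UNIQUE_DITAGS + TAGS_FROM_REPEATED_DITAGS == TOTAL_TAGS

# Tags remaining after linker-derived tags are excluded.
analyzed_tags = TAGS_FROM_UNIQUE_DITAGS - LINKER_TAGS

# Fraction of analyzed tags that could not be assigned.
unassigned_fraction = UNASSIGNED_TAGS / analyzed_tags

# ASSUMPTION: independent errors per base, so a tag is error-free with
# probability (1 - p) ** TAG_LENGTH_BP; solve for p.
per_base_error = 1 - (1 - unassigned_fraction) ** (1 / TAG_LENGTH_BP)

# ASSUMPTION: uniformly spaced sites, so the chance of a site falling in
# the last 10 bp before the polyA tail is about 10 / spacing.
fraction_near_polya = 10 / NLAIII_SPACING_BP

print(analyzed_tags)
print(f"{per_base_error:.2%} per base")
print(f"{fraction_near_polya:.1%} of transcripts")
```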

As very sparse data are available for yeast mRNA sequences and efforts to date
have not been able to identify a highly conserved polyadenylation signal (Irniger and
Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of SAGE tags (i.e. the NlaIII
site plus the adjacent 10 bp) to search the yeast genome directly (yeast genome
sequence obtained from the Stanford yeast genome ftp site (genome-ftp.stanford.edu)
on August 7, 1996). Because only coding regions are annotated in the yeast genome,
and SAGE tags can be derived from 3' untranslated regions of genes, a SAGE tag was
considered to correspond to a particular gene if it matched the ORF or the region 500
bp 3' of the ORF (locus names, gene names and ORF chromosomal coordinates were
obtained from the Stanford yeast genome ftp site, and ORF descriptions were obtained
from the MIPS www site (http://www.mips.biochem.mpg.de/) on August 14, 1996).
ORFs were considered genes with known functions if they were associated with a
three letter gene name, while ORFs without such designations were considered
uncharacterized.

As expected, SAGE tags matched transcribed portions of the genome in a highly
non-random fashion, with 88% matching ORFs or their adjacent 3' regions in the
correct orientation (chi-squared P value <10"^^). In instances when more than one tag 
matched a particular ORF in the correct orientation, the abundance was calculated to
be the sum of the matched tags. Tags that matched ORFs in the incorrect orientation
were not used in abundance calculations. In instances when a tag matched more than
one region of the genome (for example an ORF and non-ORF region) only the
matched ORF was considered. In some cases the 15th base of the tag could also be
used to resolve ambiguities.
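
The assignment rules in the preceding paragraph can be summarized in a short sketch. It is an illustration under stated assumptions rather than the procedure actually implemented: the input layout and function name are hypothetical, and a tag that matches more than one ORF in the correct orientation is simply skipped here, whereas the text notes that the 15th base could sometimes resolve such cases.

```python
# Illustrative sketch: sum tag counts per ORF following the stated rules.
# ASSUMPTION: tags matching several ORFs in the correct orientation are
# skipped as ambiguous; the data layout is hypothetical.
from collections import defaultdict


def orf_abundances(tag_counts, tag_matches):
    """Return {orf_name: summed tag count}.

    tag_counts: {tag: observed count}
    tag_matches: {tag: [(orf_name or None, correct_orientation), ...]}
        where orf_name is None for a match outside any ORF region.
    """
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        # Keep only ORF matches in the correct orientation; non-ORF and
        # wrong-orientation matches do not contribute to abundance.
        orf_hits = {orf for orf, sense in tag_matches.get(tag, [])
                    if orf is not None and sense}
        if len(orf_hits) == 1:
            totals[orf_hits.pop()] += count
    return dict(totals)
```

In this sketch a tag that matches both an ORF and a non-ORF region is credited to the ORF, and several distinct tags matching the same ORF are summed.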

For the identification of NORF genes, only tags were considered that matched
portions of the genome that were further than 500 bp 3' of a previously identified ORF
and were observed at least two times in the SAGE libraries.

Additional NORFs

SAGE Tag     Seq. ID No.   Chr   Tag Pos   Copies/cell
GGCGCAATTT   97            4     1108395   2
TAAGTGATGA   98            7     593382    2
TTGTTGAATT   99            10    608373    2
GAAGCAGTAA   100           3     155607    2
ACATATGTTA   101           4     916112    2
CCCTACACGG   102           6     223289    2
GTAATTGGAC   103           10    392099    2
ATCAGACAAA   104           14    687272    2
TTATGAAAGA   105           15    81263     2
ATTCGTTCTA   106           15    841970    2
AGCAGGAGTT   107           16    188350    2
TTCTATTAGG   108           2     418749    2
TGGATTTCAG   109           4     1224930   2
CAGATATAAT   110           5     52488     2
CTGTTTTGGG   111           11    374761    2
CATTTTTAGT   112           11    508212    2
TTGAAAAGAT   113           13    104160    2
TAAGCCCATC   114           13    251273    2
AGCGTCCTCA   115           15    832420    2
TTTAGTTAAT   116           2     477623    2
ATGGTAGCCA   117           3     56961     2
AATTAGACTA   118           3     162589    2
AGTGACTCTT   119           4     1490879   2
GGACTATAAG   120           5     251266    2
ACTTTTTCAG   121           10    159213    2
GTCATATAGT   122           13    158765    2
CAACAAAGTG   123           13    171166    2
GTGGGAAAGG   124           13    804600    2
TACTTTATAT   125           16    366449    2
AATACCAGCG   126           3     175540
GCCTTGTATA   127           4     372624
GGTACATTCA   128           5     67152
GATTTCTCTG   129           5     187462
TAGTTGCTCC   130           7     317108
GTAAGAAATC   131           7     836202
CTTGGGCTAT   132           8     107992
AAATGGTGAT   133           11    558686
ATCATTTGGG   134           12    199358
CTGAACTTTA   135           12    283720
CCAGAAGGAG   136           13    652873
CCGGTTACTA   137           15    803663
CGATGAGAAG   138           15    1004369
AAACCGTCCC   139           16    199141
TCATTCATAC   140           2     164728
TATCTTTTTG   141           4     169784
TTAGAATAAT   142           4     603508
GTACGCTGTG   143           5     118089
TATATTAATT   144           6     64228
GTTCTTGCCT   145           7     93957Q
ATATAGCTGC   146           10    181144
CCAAAAAAAA   147           11
GAACTCCACA   148           11    94125
CCTTCACTGC   149           11    374172
CACATCATAA   150           11    625896
GAAGTATTGA   151           12    603999
TGCGCGTATA   152           13    206410
GGGTAGTACT   153           13    671730
TAGTTTTGTC   154           15    33475
CAATTCCTAC   155           1     172182    0.8
TTTGATTTGA   156           2     46431     0.8
GGCTCTGGTT   157           2     414510    0.8
CAGAAATAGC   158           2     565130    0.8
CTGTTATTTT   159           2     616054    0.8
CGAAGTCAAA   160           2     680605    0.8
CTCTAGATAA   161           3     171584    0.8
AGTCAAAATG   162           4     192750    0.8
GCGAGTTTAG   163           4     691301    0.8
GCTCCAATAG   164           4     1131020   0.8
TTTATTTGAG   165           4     1237501   0.8
GTTATATTGA   166           4     1401803   0.8
TGGGTTGAAG   167           5               0.8
ATTTTATTTG   168           5               0.8
ATCATAAAAA   169           5     548612    0.8
TTATATAAAA   170           6     223182    0.8
CTACTTCTGC   171           8               0.8
ATAAGACAGT   172           10    227802    0.8
TTCATAAGTT   173           10    471804    0.8
TAAATCTGAG   174           11    145617    0.8
CTGGTAGAAA   175           11    151174    0.8
CACGTACACA   176           11    403208    0.8
CCAAGATCAA   177           11    425882    0.8
AGCTTGTTCC   178           12    234966    0.8
CACATTCGTT   179           12    759953    0.8
CTTACATATA   180           12    789781    0.8
TCTATAGCAA   181           13    228Q36    0.8
CCTTTCTGAA   182           13    29798*1   0.8
CCTTTAGAAT   183           13    777999    0.8
AATTAACACC   184           13    842122    0.8
GCGCAGGGGC   185           14    440Q84    0.8
TGTTTATAAA   186           14    661710    0.8
AAAAGTCATT   187           15              0.8
TTCGTAAACT   188           15    680625    0.8
TTTTTGGAGT   189           15    888343    0.8
AGGCATCTTG   190           16    250284    0.8
AAATCAAAAC   191           16    453890    0.8
AATTGACGAA   192           16    560169    0.8
TTGATGATTT   193           16    582360    0.8
CCTGTTTTTG   194           16    643476    0.8
TIM I AAAAA  195           1     101436    0.5
             196                           0.5
AGCACCTATG   197                           0.5
TGATTTATCC   198                           0.5
ACTGCATCTG   199                           0.5
CAAGTTAGGA   200                           0.5
ATACCCAATT   201                           0.5
AACTTTGTAT   202                           0.5
GCGGCGGGTG   203                           0.5
AAAATTGTTC   204                           0.5
TCAAGTACTC   205                           0.5
AACTGTATGC   206                           0.5
CTATCGGCCA   207                           0.5
ACAAGCCCAA   208                           0.5
GTACAGGGCT   209                           0.5
AAGATCATCG   210                           0.5
GAACTCCTGG   211                           0.5
GAACGAGAAG   212                           0.5
TTTTTAATAC   213                           0.5
TCTCCAGTTG   214                           0.5
AATACGTTAC   215                           0.5
ACGATTGGCT   216           4               0.5
TGTTTATAAG   217           4               0.5
CGTTTTCGTC   218           4               0.5
TCGAACCTCT   219           4               0.5
TCCACACACA   220           4               0.5
CCGTGCGTGC   221           4               0.5
TTTCTTCAAC   222                           0.5
CCAAGTCTCG   223                           0.5
AGAGCGAATT   224                           0.5
TGTAGATTAT   225                           0.5
AAAAGTAGTT   226                           0.5
ACTTGGTATG   227                           0.5
TTAATGTTAT   228                           0.5
TACACGCGCG   229                           0.5
GGTCACTCCT   230                           0.5
AAGTGATGAA   231                           0.5
TTTATCTTGT   232
AGTGATTGTT   233                           0.5
GCTTTGTTGT   234           7               0.5
TCATTGATTC   235                           0.5
TTCACCGGAA   236                           0.5
ACTATTCTGT   237           7               0.5
GGGCCAACCC   238           7               0.5
AAAATATCTT   239           7               0.5
TAGTAGTAAC   240           7               0.5
AAGCGCACAA   241           7               0.5
TCGCTGTTTT   242           7     800300    0.5
TGTATTTTTG   243           7     836202    0.5
CTAAACAAAG   244           7     836587    0.5
TAGGAAGAAA   245           7     905046    0.5
GGAAAAATTA   246           7     958839    0.5
TTTGGATAGT   247           7               0.5
CGTTTGTGTA   248           8
AGAAAAAAAC   249           8               0.5
TAAAGTCCAG   250           8               0.5
TAAGCAGATT   251           8               0.5
ATGAGCATTT   252           9               0.5
AGGTGCAAAA   253           9     229077    0.5
TAACAAAGAG   254           10              0.5
CAATTGGCAA   255           10              0.5
ACTCCCTGTA   256           11              0.5
CTCTATTGAT   257           11              0.5
GCTTTCCTTT   258           11              0.5
ACCGCAAAGA   259           11              0.5
CTTGTTCAAA   260           12              0.5
AATGTGCTGT   261           12              0.5
GCAGATAGCG   262           12              0.5
TCTGACTTAG   263           12              0.5
CCCGGATGTT   264           12              0.5
GTAACGATTG   265           12              0.5
GAATAACGAA   266           12              0.5
ACTGCTATTT   267           12              0.5
GTTCTCTAGC   268           12    719719    0.5
CATCACCATC   269           12              0.5
TTGCACTTCT   270           12
ACTGTTTATG   271           12              0.5
TTGCTATATA   272           12              0.5
TACATTCTAA   273           13              0.5
CTCTTAGTTG   274           13
ACGAACACTT   275           13              0.5
TGCGCAAGTC   276           13              0.5
TTTTTCTTAA   277           13              0.5
CAAATGCATT   278           13    390802    0.5
CAAATTGTGT   279           13              0.5
GCAATACTAT   280           13    826*521